William Gann's Hyperparameter Tuning of Feature Engineering Pipelines using GridSearchCV
Hyperparameter tuning is the process of searching for the hyperparameter values that maximize a model's performance. In the context of feature engineering pipelines, this includes tuning the parameters of the transformers themselves, such as the window size of a moving average or the number of components in a PCA.
scikit-learn's GridSearchCV provides a simple and effective way to perform this search.
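For a transformer's parameters to be tunable, the transformer must expose them through scikit-learn's estimator interface (`get_params`/`set_params`, which `BaseEstimator` provides automatically from `__init__` arguments). As a minimal sketch, here is a hypothetical moving-average transformer of the kind assumed throughout this section; the class name, the trailing-mean implementation, and the warm-up fill are all illustrative choices, not part of any library:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MovingAverageTransformer(BaseEstimator, TransformerMixin):
    """Append short- and long-window trailing moving averages of the
    first column as extra feature columns (illustrative sketch)."""

    def __init__(self, short_window=20, long_window=200):
        # Stored under the same names as the __init__ arguments, so
        # GridSearchCV can read and set them via get_params/set_params.
        self.short_window = short_window
        self.long_window = long_window

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)

        def trailing_mean(a, w):
            # Trailing mean over a window of w observations; the first
            # w-1 entries have no full window, so we crudely fill them
            # with the series' first value (an assumption, not a rule).
            ma = np.convolve(a, np.ones(w) / w, mode="full")[: len(a)]
            ma[: w - 1] = a[0]
            return ma

        short = trailing_mean(X[:, 0], self.short_window)
        long_ = trailing_mean(X[:, 0], self.long_window)
        return np.column_stack([X, short, long_])
```

Because `short_window` and `long_window` are declared in `__init__`, a grid search can address them as `moving_average__short_window` once the transformer is placed in a pipeline under the step name `moving_average`.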
GridSearchCV and RandomizedSearchCV
GridSearchCV performs an exhaustive search over every combination in a specified parameter grid. RandomizedSearchCV instead samples a fixed number of candidates from specified parameter distributions. RandomizedSearchCV is often preferred for large parameter spaces, since evaluating a random sample of combinations is far cheaper than enumerating all of them.
```python
from sklearn.model_selection import GridSearchCV

# Assume `pipeline` is our feature engineering pipeline with a step
# named "moving_average"; parameters are addressed as "<step>__<param>".
parameters = {
    'moving_average__short_window': [10, 20, 50],
    'moving_average__long_window': [100, 200, 300],
}

grid_search = GridSearchCV(pipeline, parameters, cv=5)
grid_search.fit(X, y)  # X, y: the feature matrix and target series
```
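A RandomizedSearchCV version works the same way, except that the grid is replaced by distributions to sample from. The sketch below is self-contained: the PCA-plus-Ridge pipeline and the synthetic data are illustrative stand-ins for the feature engineering pipeline above, not part of it.

```python
import numpy as np
from scipy.stats import randint
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

# Stand-in pipeline: PCA (a transformer mentioned above) plus a Ridge model.
pipeline = Pipeline([("pca", PCA()), ("model", Ridge())])

# Distributions instead of a grid: randint draws integers, lists are
# sampled uniformly. Only n_iter candidates are evaluated in total.
param_distributions = {
    "pca__n_components": randint(1, 6),       # integers in [1, 5]
    "model__alpha": [0.01, 0.1, 1.0, 10.0],
}

# Synthetic data purely for the demonstration.
rng = np.random.RandomState(0)
X = rng.randn(100, 8)
y = X[:, 0] + 0.1 * rng.randn(100)

search = RandomizedSearchCV(pipeline, param_distributions,
                            n_iter=10, cv=5, random_state=0)
search.fit(X, y)
```

With `n_iter=10`, only ten parameter combinations are evaluated, regardless of how large the joint space is; the exhaustive grid here would already contain 20 combinations, and real feature pipelines grow much faster.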
Cross-Validation Strategies for Financial Time Series
Standard cross-validation techniques, such as k-fold cross-validation, are not suitable for financial time series data: shuffled folds let the model train on observations that occur after those it is tested on, a form of data leakage (look-ahead bias). Instead, we need cross-validation strategies designed for time series, such as TimeSeriesSplit, which always places the test window after the training window.
```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
grid_search = GridSearchCV(pipeline, parameters, cv=tscv)
```
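To see why TimeSeriesSplit avoids leakage, it helps to print the indices it actually generates on a small toy series:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered observations
tscv = TimeSeriesSplit(n_splits=3)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Every training window ends before its test window begins,
    # so no future observation leaks into training.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

The training window grows with each fold while the test window slides forward, mimicking how a model would be re-fit and evaluated on successive periods in live trading.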
Mathematical Formulation: Cross-Validation
The goal of cross-validation is to estimate the generalization error of a model: the data is split into $k$ folds, the model is evaluated on each held-out fold in turn, and the average of the fold errors is used as the estimate.
$$E_{cv} = \frac{1}{k} \sum_{i=1}^{k} E_i$$
Where:
- $E_{cv}$ is the cross-validation error.
- $k$ is the number of folds.
- $E_i$ is the error on the $i$-th fold.
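In scikit-learn, this average is obtained by collecting the per-fold scores from `cross_val_score` and taking their mean. A minimal sketch on synthetic data (the Ridge model and the data-generating coefficients are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic regression data with a known linear signal (illustrative).
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.randn(60)

# One score E_i per fold, using the time-series-aware splitter.
fold_scores = cross_val_score(Ridge(), X, y, cv=TimeSeriesSplit(n_splits=5))

# E_cv from the formula above: the mean over the k folds.
e_cv = fold_scores.mean()
```

Note that `cross_val_score` reports a *score* (R² for regressors by default) rather than an error, so higher is better; the averaging in the formula is the same either way.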
After fitting, the best parameters and cross-validated score might look like:

| Result | Value |
|---|---|
| `short_window` | 20 |
| `long_window` | 200 |
| CV score | 0.75 |
By tuning the hyperparameters of your feature engineering pipeline, you can significantly improve the performance of your trading models.
